{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "You can run either this notebook locally (if you have all the dependencies and a GPU) or on Google Colab.\n",
    "\n",
    "Instructions for setting up Colab are as follows:\n",
    "1. Open a new Python 3 notebook.\n",
    "2. Import this notebook from GitHub (File -> Upload Notebook -> \"GITHUB\" tab -> copy/paste GitHub URL)\n",
    "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n",
    "4. Run this cell to set up dependencies.\n",
    "\"\"\"\n",
    "# If you're using Google Colab and not running locally, run this cell.\n",
    "# !pip install wget\n",
    "# !pip install git+https://github.com/NVIDIA/apex.git\n",
    "# !pip install nemo_toolkit[nlp]\n",
    "# !pip install unidecode\n",
    "import os\n",
    "import nemo\n",
    "import nemo.collections.nlp as nemo_nlp\n",
    "import numpy as np\n",
    "import time\n",
    "import errno\n",
    "import json\n",
    "\n",
    "from nemo.backends.pytorch.common.losses import CrossEntropyLossNM\n",
    "from nemo.collections.nlp.nm.data_layers import BertQuestionAnsweringDataLayer\n",
    "from nemo.collections.nlp.nm.trainables import TokenClassifier\n",
    "from nemo.collections.nlp.callbacks.qa_squad_callback import eval_epochs_done_callback, eval_iter_callback\n",
    "from nemo.utils.lr_policies import get_lr_policy\n",
    "from nemo import logging"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "BioBERT has the same network architecture as the original BERT, but instead of Wikipedia and BookCorpus it is pretrained on PubMed, a large biomedical text corpus, which achieves better performance in biomedical downstream tasks, such as question answering(QA), named entity recognition(NER) and relationship extraction(RE). This model was trained for 1M steps. For more information please refer to the original paper https://academic.oup.com/bioinformatics/article/36/4/1234/5566506.  For details about BERT please refer to https://ngc.nvidia.com/catalog/models/nvidia:bertbaseuncasedfornemo.\n",
    "\n",
    "BioMegatron is an in house model, using Megatron https://github.com/NVIDIA/Megatron-LM pretrained on PubMed. The accuracy is better than using BioBERT on downstream tasks\n",
    "\n",
    "\n",
    "In this notebook we're going to showcase how to train BioBERT/BioMegatron on a biomedical question answering (QA) dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Download model  checkpoint\n",
    "Download BioBert/BioMegatron checkpoints finetuned on SQuADv1.1 from  NGC: https://ngc.nvidia.com/catalog/models. Alternatively, you can also download BioBert/BioMegatron checkpoints and do the finetuning on SQuADv1.1 locally. This will take some time. For this, follow instructions at https://ngc.nvidia.com/catalog/models/nvidia:bertbaseuncasedsquadv1. \n",
    "    Then, put the encoder weights at `./checkpoints/biobert/qa_squad/BERT.pt` or `./checkpoints/biomegatron/qa_squad/BERT.pt`, the model head weights at `./checkpoints/biobert/qa_squad/TokenClassifier.pt` or `./checkpoints/biomegatron/qa_squad/TokenClassifier.pt` and the model configuration file at `./checkpoints/biobert/qa_squad/bert_config.json` or `./checkpoints/biomegatron/qa_squad/bert_config.json`. \n",
    "For BioBERT download e.g. https://ngc.nvidia.com/catalog/models/nvidia:biobertbasecasedfornemo.\n",
    "For BioMegatron download e.g. https://ngc.nvidia.com/catalog/models/nvidia:biomegatron345muncased."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set which model to use.\n",
    "model_type=\"biobert\" # \"biomegatron\"\n",
    "base_checkpoint_path={'biobert': './checkpoints/biobert/qa_squad', 'biomegatron': './checkpoints/biomegatron/qa_squad'}\n",
    "pretrained_model_name={'biobert': 'bert-base-cased', 'biomegatron': 'megatron-bert-uncased'}\n",
    "do_lower_case={'biobert': False, 'biomegatron': True}\n",
    "work_dir={'biobert': 'output_bioasq_biobert', 'biomegatron': 'output_bioasq_biomegatron'}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the checkpoints are available from NGC: https://ngc.nvidia.com/catalog/models\n",
    "CHECKPOINT_ENCODER = os.path.join(base_checkpoint_path[model_type], 'BERT.pt')\n",
    "CHECKPOINT_HEAD = os.path.join(base_checkpoint_path[model_type], 'TokenClassifier.pt')\n",
    "CHECKPOINT_CONFIG = os.path.join(base_checkpoint_path[model_type], 'bert_config.json')\n",
    "    \n",
    "if not os.path.exists(CHECKPOINT_ENCODER):\n",
    "    raise OSError(errno.ENOENT, os.strerror(errno.ENOENT), CHECKPOINT_ENCODER)\n",
    "\n",
    "if not os.path.exists(CHECKPOINT_HEAD):\n",
    "    raise OSError(errno.ENOENT, os.strerror(errno.ENOENT), CHECKPOINT_HEAD)\n",
    "    \n",
    "if not os.path.exists(CHECKPOINT_CONFIG):\n",
    "    raise OSError(errno.ENOENT, os.strerror(errno.ENOENT), CHECKPOINT_CONFIG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Download training data\n",
    "You first need to download the QA dataset BioASQ 7B to ./datasets/bioasq. Before using the files in this repository, you must first register BioASQ website and download the [BioASQ Task B](http://participants-area.bioasq.org/Tasks/A/getData/) data.\n",
    "You can also download part of the data using ../question_answering/get_bioasq.py.\n",
    "However the test labels for 7B need to be downloaded from the official website.\n",
    "In the following we show an example for training and inference for 7B which is a superset of 6B."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_dir=\"./datasets\"\n",
    "dataset=\"BioASQ\"\n",
    "if not os.path.exists(f\"{data_dir}/{dataset}\"):\n",
    "    !python ../question_answering/get_bioasq.py --data_dir=$data_dir\n",
    "!ls -l $data_dir/$dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the previous step, you should have a ./datasets/BioASQ folder that contains the following files:\n",
    "\n",
    "- 6B1_golden.json\n",
    "- 6B2_golden.json\n",
    "- 6B3_golden.json\n",
    "- 6B4_golden.json\n",
    "- 6B5_golden.json\n",
    "- BioASQ-6b/train/Full-Abstract/BioASQ-train-factoid-6b-full-annotated.json\n",
    "- BioASQ-6b/test/BioASQ-6b/test/Full-Abstract/BioASQ-test-factoid-6b-1.json\n",
    "- BioASQ-6b/test/BioASQ-6b/test/Full-Abstract/BioASQ-test-factoid-6b-2.json\n",
    "- BioASQ-6b/test/BioASQ-6b/test/Full-Abstract/BioASQ-test-factoid-6b-3.json\n",
    "- BioASQ-6b/test/BioASQ-6b/test/Full-Abstract/BioASQ-test-factoid-6b-4.json\n",
    "- BioASQ-6b/test/BioASQ-6b/test/Full-Abstract/BioASQ-test-factoid-6b-5.json\n",
    "- BioASQ-7b/train/Full-Abstract/BioASQ-train-factoid-7b-full-annotated.json\n",
    "- BioASQ-7b/test/BioASQ-7b/test/Full-Abstract/BioASQ-test-factoid-7b-1.json\n",
    "- BioASQ-7b/test/BioASQ-7b/test/Full-Abstract/BioASQ-test-factoid-7b-2.json\n",
    "- BioASQ-7b/test/BioASQ-7b/test/Full-Abstract/BioASQ-test-factoid-7b-3.json\n",
    "- BioASQ-7b/test/BioASQ-7b/test/Full-Abstract/BioASQ-test-factoid-7b-4.json\n",
    "- BioASQ-7b/test/BioASQ-7b/test/Full-Abstract/BioASQ-test-factoid-7b-5.json\n",
    "\n",
    "The format of the data described in NeMo docs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create Neural Modules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_checkpoint=CHECKPOINT_ENCODER # language model encoder file\n",
    "head_checkpoint=CHECKPOINT_HEAD # language model encoder file\n",
    "model_config=CHECKPOINT_CONFIG # model configuration file\n",
    "work_dir=work_dir[model_type]\n",
    "train_file=f\"{data_dir}/{dataset}/BioASQ-7b/train/Full-Abstract/BioASQ-train-factoid-7b-full-annotated.json\"\n",
    "doc_stride=128\n",
    "max_query_length=64\n",
    "max_seq_length=384\n",
    "batch_size=12\n",
    "version_2_with_negative=False\n",
    "\n",
    "nf = nemo.core.NeuralModuleFactory(\n",
    "    placement=nemo.core.DeviceType.GPU,\n",
    "    log_dir=work_dir\n",
    ")\n",
    "model = nemo_nlp.nm.trainables.get_pretrained_lm_model(\n",
    "        config=model_config, pretrained_model_name=pretrained_model_name[model_type], checkpoint=model_checkpoint\n",
    "    )\n",
    "tokenizer = nemo.collections.nlp.data.tokenizers.get_tokenizer(\n",
    "    tokenizer_name='nemobert',\n",
    "    pretrained_model_name=pretrained_model_name[model_type],\n",
    "    do_lower_case=do_lower_case[model_type]\n",
    ")\n",
    "hidden_size = model.hidden_size\n",
    "qa_head = TokenClassifier(\n",
    "    hidden_size=hidden_size, num_classes=2, num_layers=1, log_softmax=False, name=\"TokenClassifier\"\n",
    ")\n",
    "qa_head.restore_from(head_checkpoint)\n",
    "task_loss = nemo_nlp.nm.losses.SpanningLoss()\n",
    "# create training data layer, preprocessing takes a while. If you want to cache preprocessed data for future reuse use --use_cache=True\n",
    "# remember to delete the cache when you switch the tokenizer/model (BioBERT and BioMegatron use different tokenizers)\n",
    "train_data_layer = BertQuestionAnsweringDataLayer(\n",
    "    mode=\"train\",\n",
    "    tokenizer=tokenizer,\n",
    "    version_2_with_negative=version_2_with_negative,\n",
    "    data_file=train_file,\n",
    "    max_query_length=max_query_length,\n",
    "    max_seq_length=max_seq_length,\n",
    "    doc_stride=doc_stride,\n",
    "    batch_size=batch_size,\n",
    "    shuffle=True,\n",
    "    use_cache=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Creating Neural graph"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_data = train_data_layer()\n",
    "hidden_states = model(input_ids=train_data.input_ids, token_type_ids=train_data.input_type_ids, attention_mask=train_data.input_mask)\n",
    "qa_output = qa_head(hidden_states=hidden_states)\n",
    "loss = task_loss(logits=qa_output, start_positions=train_data.start_positions, end_positions=train_data.end_positions)\n",
    "# If you're training on multiple GPUs, this should be\n",
    "# len(train_data_layer) // (batch_size * batches_per_step * num_gpus)\n",
    "train_steps_per_epoch = len(train_data_layer) // batch_size\n",
    "logging.info(f\"doing {train_steps_per_epoch} steps per epoch\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create Callbacks\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_callback = nemo.core.SimpleLossLoggerCallback(\n",
    "    tensors=[loss.loss],\n",
    "    print_func=lambda x: logging.info(\"Loss: {:.3f}\".format(x[0].item())),\n",
    "    get_tb_values=lambda x: [[\"loss\", x[0]]],\n",
    "    step_freq=100,\n",
    "    tb_writer=nf.tb_writer,\n",
    ")\n",
    "ckpt_callback = nemo.core.CheckpointCallback(\n",
    "    folder=nf.checkpoint_dir, epoch_freq=1, step_freq=-1\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training\n",
    "this may take more than an hour."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_epochs=5\n",
    "lr=5e-6\n",
    "lr_warmup_proportion=0\n",
    "weight_decay=0\n",
    "lr_policy_fn = get_lr_policy(\"WarmupAnnealing\", total_steps=num_epochs * train_steps_per_epoch, warmup_ratio=lr_warmup_proportion\n",
    ")\n",
    "nf.reset_trainer()\n",
    "nf.train(\n",
    "    tensors_to_optimize=[loss.loss],\n",
    "    callbacks=[train_callback, ckpt_callback],\n",
    "    lr_policy=lr_policy_fn,\n",
    "    optimizer=\"adam_w\",\n",
    "    optimization_params={\"num_epochs\": num_epochs, \"lr\": lr, \"weight_decay\": weight_decay},\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inference\n",
    "Do inference on test data 7b-1 7b-2 7b-3 7b-4 7b-5. Here we only show inference with 7b-4. Rerun the following cells with all 5 test sets to get all numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_dataset=\"7b\"\n",
    "test_dataset_idx=\"4\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_file=f\"{data_dir}/{dataset}/BioASQ-{test_dataset}/test/Full-Abstract/BioASQ-test-factoid-{test_dataset}-{test_dataset_idx}.json\"\n",
    "logging.info(f\"using test file {test_file}\")\n",
    "test_data_layer = BertQuestionAnsweringDataLayer(\n",
    "    mode=\"test\",\n",
    "    tokenizer=tokenizer,\n",
    "    version_2_with_negative=version_2_with_negative,\n",
    "    data_file=test_file,\n",
    "    max_query_length=max_query_length,\n",
    "    max_seq_length=max_seq_length,\n",
    "    doc_stride=doc_stride,\n",
    "    batch_size=1,\n",
    "    shuffle=False,\n",
    "    use_cache=True\n",
    ")\n",
    "\n",
    "# Creating Neural test graph\n",
    "test_data = test_data_layer()\n",
    "test_hidden_states = model(input_ids=test_data.input_ids, token_type_ids=test_data.input_type_ids, attention_mask=test_data.input_mask)\n",
    "test_qa_output = qa_head(hidden_states=test_hidden_states)\n",
    "test_tensors=[test_data.unique_ids, test_qa_output]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "n_best_size=20\n",
    "null_score_diff_threshold=0\n",
    "max_answer_length=30\n",
    "logging.info(f\"work dir {work_dir}, checkpoint dir {nf.checkpoint_dir}\")\n",
    "output_prediction_file=f\"{work_dir}/predictions.json\"\n",
    "output_nbest_file=f\"{work_dir}/nbest.json\"\n",
    "evaluated_tensors = nf.infer(\n",
    "    tensors=test_tensors, cache=False, offload_to_cpu=False, checkpoint_dir=nf.checkpoint_dir\n",
    ")\n",
    "unique_ids = []\n",
    "for t in evaluated_tensors[0]:\n",
    "    unique_ids.extend(t.tolist())\n",
    "logits = []\n",
    "for t in evaluated_tensors[1]:\n",
    "    logits.extend(t.tolist())\n",
    "start_logits, end_logits = np.split(np.asarray(logits), 2, axis=-1)\n",
    "(all_predictions, all_nbest, scores_diff) = test_data_layer.dataset.get_predictions(\n",
    "    unique_ids=unique_ids,\n",
    "    start_logits=start_logits,\n",
    "    end_logits=end_logits,\n",
    "    n_best_size=n_best_size,\n",
    "    max_answer_length=max_answer_length,\n",
    "    version_2_with_negative=version_2_with_negative,\n",
    "    null_score_diff_threshold=null_score_diff_threshold,\n",
    "    do_lower_case=do_lower_case[model_type],\n",
    ")\n",
    "with open(output_nbest_file, \"w\") as writer:\n",
    "    writer.write(json.dumps(all_nbest, indent=4) + \"\\n\")\n",
    "with open(output_prediction_file, \"w\") as writer:\n",
    "    writer.write(json.dumps(all_predictions, indent=4) + \"\\n\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "./datasets/BioASQ/BioASQ-7b/test/Full-Abstract/BioASQ-test-factoid-7b-4.json\n",
      "        {\n",
      "          \"context\": \"Construction of a natural panel of 11p11.2 deletions and further delineation of the critical region involved in Potocki-Shaffer syndrome. Potocki-Shaffer syndrome (PSS) is a contiguous gene deletion syndrome that results from haploinsufficiency of at least two genes within the short arm of chromosome 11[del(11)(p11.2p12)]. The clinical features of PSS can include developmental delay, mental retardation, multiple exostoses, parietal foramina, enlarged anterior fontanel, minor craniofacial anomalies, ophthalmologic anomalies, and genital abnormalities in males. We constructed a natural panel of 11p11.2-p13 deletions using cell lines from 10 affected individuals, fluorescence in situ hybridization (FISH), microsatellite analyses, and array-based comparative genomic hybridization (array CGH). We then compared the deletion sizes and clinical features between affected individuals. The full spectrum of PSS manifests when deletions are at least 2.1 Mb in size, spanning from D11S1393 to D11S1385/D11S1319 (44.6-46.7 Mb from the 11p terminus) and encompassing EXT2, responsible for multiple exostoses, and ALX4, causing parietal foramina. Yet one subject with parietal foramina whose deletion does not include ALX4 indicates that ALX4 in this subject may be rendered functionally haploinsufficient by a position effect. Based on comparative deletion mapping of eight individuals with the full PSS syndrome including mental retardation and two PSS families with no mental retardation, at least one gene related to mental retardation is likely located between D11S554 and D11S1385/D11S1319, 45.6-46.7 Mb from the 11p terminus.\", \n",
      "          \"qas\": [\n",
      "            {\n",
      "              \"question\": \"What the chromsomal location of the gene that is deleted in Potocki-Shaffer syndrome?\", \n",
      "              \"id\": \"5c72b7277c78d69471000073_001\"\n"
     ]
    }
   ],
   "source": [
    "# a test question example would be \n",
    "!echo $test_file\n",
    "!grep -B 5 \"5c72b7277c78d69471000073_001\" $test_file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "output_bioasq_biobert/nbest.json:    \"5c72b7277c78d69471000073_001\": [\r\n",
      "output_bioasq_biobert/nbest.json-        {\r\n",
      "output_bioasq_biobert/nbest.json-            \"text\": \"p11.2p12\",\r\n",
      "output_bioasq_biobert/nbest.json-            \"probability\": 0.3970165418689914,\r\n",
      "output_bioasq_biobert/nbest.json-            \"start_logit\": [\r\n",
      "output_bioasq_biobert/nbest.json-                5.539025783538818\r\n",
      "output_bioasq_biobert/nbest.json-            ],\r\n",
      "output_bioasq_biobert/nbest.json-            \"end_logit\": [\r\n",
      "output_bioasq_biobert/nbest.json-                6.180495738983154\r\n",
      "output_bioasq_biobert/nbest.json-            ]\r\n",
      "output_bioasq_biobert/nbest.json-        },\r\n",
      "output_bioasq_biobert/nbest.json-        {\r\n",
      "output_bioasq_biobert/nbest.json-            \"text\": \"p11.2p12)\",\r\n",
      "output_bioasq_biobert/nbest.json-            \"probability\": 0.1409910996956543,\r\n",
      "output_bioasq_biobert/nbest.json-            \"start_logit\": [\r\n",
      "output_bioasq_biobert/nbest.json-                5.539025783538818\r\n",
      "output_bioasq_biobert/nbest.json-            ],\r\n",
      "output_bioasq_biobert/nbest.json-            \"end_logit\": [\r\n",
      "output_bioasq_biobert/nbest.json-                5.145214557647705\r\n",
      "output_bioasq_biobert/nbest.json-            ]\r\n",
      "output_bioasq_biobert/nbest.json-        },\r\n"
     ]
    }
   ],
   "source": [
    "# the corresponding first 2 best answers of the n-best list prediction with probabilities.\n",
    "!grep -A 20 \"5c72b7277c78d69471000073_001\" $data_dir/$dataset/$prefix$suffix $output_nbest_file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      \"exact_answer\": [\r\n",
      "        [\r\n",
      "          \"11p11.2p12\"\r\n",
      "        ]\r\n",
      "      ], \r\n",
      "      \"concepts\": [], \r\n",
      "      \"type\": \"factoid\", \r\n",
      "      \"id\": \"5c72b7277c78d69471000073\", \r\n"
     ]
    }
   ],
   "source": [
    "# the golden label can be found in this following file under \"exact_answer\". In this case, the it is equal to the prediction\n",
    "prefix=test_dataset.upper()\n",
    "suffix=f\"{test_dataset_idx}_golden.json\"\n",
    "!grep -B 7 \"5c72b7277c78d69471000073\" $data_dir/$dataset/$prefix$suffix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluate inference output with BioASQ metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not os.path.exists('bioasq-biobert'):\n",
    "    print(\"clone https://github.com/dmis-lab/bioasq-biobert.git\")\n",
    "    !git clone https://github.com/dmis-lab/bioasq-biobert.git && cd bioasq-biobert && git fetch origin pull/12/head:fix_indentation && git checkout fix_indentation && cd ..\n",
    "if not os.path.exists('Evaluation-Measures'):\n",
    "    print(\"clone https://github.com/BioASQ/Evaluation-Measures.git\")\n",
    "    !git clone https://github.com/BioASQ/Evaluation-Measures.git && git checkout cd93f3b8eb290c965d18ef466ee28a0bcf451e5d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "transformed_nbest_dir=f\"{work_dir}/transformed_nbest\"\n",
    "!mkdir -p $transformed_nbest_dir\n",
    "!python bioasq-biobert/biocodes/transform_n2b_factoid.py --nbest_path=$output_nbest_file --output_path=$transformed_nbest_dir"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prefix=test_dataset.upper()\n",
    "suffix=f\"{test_dataset_idx}_golden.json\"\n",
    "! java -Xmx10G -cp Evaluation-Measures/flat/BioASQEvaluation/dist/BioASQEvaluation.jar evaluation.EvaluatorTask1b -phaseB -e 5 $data_dir/$dataset/$prefix$suffix $transformed_nbest_dir/BioASQform_BioASQ-answer.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the default hyper parameters the result for 7b-4 test factoid will look something like this for BioBERT:\n",
    "\n",
    "```0.0 0.45161290322580644 0.6774193548387096 0.5403225806451613 0.0 0.0 0.0 0.0 0.0 0.0```\n",
    "\n",
    "and for BioMegatron:\n",
    "\n",
    "```0.0 0.6470588235 0.8235294118 0.7254901961 0.0 0.0 0.0 0.0 0.0 0.0```\n",
    "\n",
    "where the second, third and fourth numbers will be strict accuracy (SAcc), lenient accuracy (LAcc) and mean reciprocal rank (MRR) for factoid\n",
    "questions respectively."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The class weighted average for 7B test factoid of all 5 tasks:\n",
    "\n",
    "| Model | SAcc | LAcc | MRR |\n",
    "| :---         |     :---:      |        :---:     |     :---: |\n",
    "|BioBERT   | 0.39     | 0.6 | 0.47   |\n",
    "|BioMegatron     | 0.48       | 0.64    |0.54|"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}